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We extend the usual logistic model between a dichotomous phenotype and an allele count 
in two ways: a polytonnous phenotype with K > 2 levels, and modeling of allele counts at 
two unlinked marker loci. Inference is based on within-family information to guard against 
potential bias due to population genetic structure. Score tests of the model coefficients 
taking into account the correlation between relatives in entire pedigrees are derived as 
an extension of the Generalized Disequilibrium Test (GDT). Simulations confirm that the 
tests have the expected statistical properties, and that their power exceeds that of the 
GDT under a favorable scenario. The score tests are illustrated with candidate genetic 
markers, a major psychosis phenotype and a cognitive endophenotype in large kindreds 
from Eastern Quebec. 



Keywords: conditional lil<eliliood, endoplienotype, family-based association, 
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kinsliip, major psychosis. 



1. INTRODUCTION 

Studies of the association between a phenotype and genetic 
markers are commonly performed on the members of fam- 
iHes of various sizes. While methods to estimate association 
parameters and test the null hypothesis of absence of associ- 
ation (possibly coupled with absence of genetic linkage) with 
dichotomous phenotypes in family samples are well devel- 
oped (see for instance chapter 12 of Ziegler and Konig, 
2010), methods are lacking to analyze polytomous pheno- 
types. Such phenotypes can arise when a disease has multi- 
ple subtypes (Guey et al., 2010) or when two dichotomous 
phenotypes are considered simultaneously. The latter occurs 
when endophenotypes are measured in genetic studies to bet- 
ter capture phenotypic complexity. Endophenotypes are traits 
related to a disease and believed to be influenced by fewer 
genes (Gottesman and Gould, 2003). A dichotomous disease 
status and a dichotomous endophenotype create a four cate- 
gory phenotype. Comparisons between analyzing a polytomous 
phenotype vs. a dichotomous one have not been done for fam- 
ily studies due to the lack of analysis methods for polytomous 
phenotypes. 

We focus in this paper on a within-family analysis, con- 
ditional on phenotype and genotype observed in each family. 
Such approach is well known to protect against confounding 
due to population stratification. Families where multiple pheno- 
typic categories are represented provide the most information on 
the relationship between a polytomous phenotype and genetic 
markers. Since families extending over multiple generations typ- 
ically need to be recruited to obtain a large number of pheno- 
typed subjects, we required that the methods for dichotomous 
traits that we generalize to polytomous traits be applicable to 



extended families. For a score test of association, we selected the 
Generalized disequilibrium test (GDT) of Chen et al. (2009). 

In previous work, we showed by simulation that condition- 
ing on a marker at a known disease susceptibility locus increased 
power to detect linkage to new loci interacting with that disease 
susceptibility locus (Bureau et al., 2009, 2012). Similar power 
gains are expected in association analysis, as conditioning on 
a known environmental risk factor increases power to detect 
loci interacting with the exposure (Kraft et al, 2007). Models 
involving genetic markers at two distinct loci are needed for anal- 
yses conditional on the genotype of known disease susceptibility 
markers and also to model the relationship between pairs of loci. 
Multi-category phenotypes present a larger realm of possibili- 
ties of interplay between multiple loci than dichotomous traits, 
making multilocus modeling even more important to capture the 
actual effects. This is why we derive score tests under two-locus 
models, with one marker at each locus, in addition to one-locus 
models. The Type I error and the power of tests of various combi- 
nations of regression coefficients are assessed using simulation. 
The tests are also illustrated with candidate genetic markers, a 
major psychosis phenotype and a cognitive endophenotype in the 
Eastern Quebec kindred study. 

2. METHODS 

We extend the GDT of Chen et al. (2009) in two ways: by allow- 
ing the outcome Y to have K > 2 levels, and by allowing the odds 
of the outcome categories to depend on two or more variables X, 
coding the genotype of markers at two mutually unlinked marker 
loci. As in the original GDT, X represents the count of a particular 
form of the DNA sequence at the marker, called allele. We begin 
by deriving the score statistic from the conditional likelihood 
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for a polytomous outcome Y with a general vector X of allelic 
count terms (possibly including product terms). Then we derive 
expressions for particular forms of terms in X. 

The polytomous model for subject with a general Jf, vector 
can be written 



log 



pik + P'kXk„k=l,--- ,K-1 (1) 



where X^i is the sub-vector of X, containing the allelic terms 
related to level (category) k and fit the sub-vector of the full coef- 
ficient vector jS applicable to level k (in this general formulation, 
f} coefficients can either be distinct for each level k or can be 
common to multiple levels of k). 

Without loss of generality, we assume that the n genotyped 
pedigree members with an observed phenotype are ordered such 
that the first «i subjects are in outcome category 7=1, the «2 
following subjects are in outcome category Y = 2 and so on up to 
the last riK subjects with Y = K. 

With K = 2 and a single X {Pi = f) a scalar, without covari- 
ates), Chen et al. (2009) showed that the contribution of the 
family to the score statistic from the conditional likelihood P to 
test the nuU hypothesis P = 0 has the form: 



cGDT 



giogPi 

dfi 1/3 = 0 



«1 n 

, = lj=„l + l 



(X,-Xj) 



(2) 



We show in Supplementary Material that the contribution of a 
family to the score statistic for the coefficient fif, component of /3 
when testing the global null hypothesis that the fuU ^ = 0 under 
a polytomous model is: 



1 



aiogPi _ 

9/J/, 1^=0 n 



i"=ij=«i+i 

2^ V-i);j 



i = n-(nK-i + 1j€£k-1 



(3) 



where Ek-i = {I, ■ ■ ■ ,n — (tiK- i — Mic), n — hk + 1, • ■ ■ , «} 
and x'*"' is the slice of X^, related to the coefficient /j/,. If fih is 
involved only in the logistic function between levels a and K, then 
the score statistic simplifies to: 



c(;>) 



SlogP 



«i -I h Mfl} for a > 1 



where = {«! + ■ 
and£i = {1 • ■ ■ ni}. 

The advantage of expression 3 is that a closed-form expres- 
sion for the variance of S""^ and the covariance of S^^' and SC"' 
for coefficients fig and /3;, can be derived, following the steps of 
Chen et al. It is also easier to interpret. When the tested coefficient 
belongs to the logistic function attached to a single outcome cate- 
gory and the score statistic reduces to expression 4, it is a contrast 



of the value of the corresponding X term between subjects in the 
outcome category and subjects in all other categories. 

Letting v[S] be an estimate of the variance -covariance matrix 
of S, the nuU hypothesis that f) = 0 can then be tested with the 
statistic 

T = S'v[Sr^S 

which follows a distribution with degrees of freedom equal to 
the rank of j3 under the null. 

When testing the sub nuU hypothesis Phi = ■ ■ ■ = fih,,, = 0 for 
any subset of indices hi, ■ ■ ■ ,h„„ the other coefficients are free 
to differ from 0 and the derivation in Supplementary Material no 
longer applies. We adopt here the approach Chen et al. (2009) 
apply to model covariates, which is to weight the pairwise differ- 
ences according to a model of the outcome 7 as a function of the 
predictors with free coefficients under the null hypothesis. The 
score statistic for the component fii, of the subset of coefficients 
tested then becomes 



31ogP 

dPh 'Phi 



l€Eaj€E'„ 



X 



(h) 



X' 



?) (5) 



where the weights Cy can be derived from score equations for Ph 
under the pairwise formulation of Liang and Stewart (1987) (see 
Supplementary Material), leading to the following functions of 
the coefficients a of a polytomous logistic model of 7 as a func- 
tion of the predictors X^'^\ c = {I : I ^ (hi, ■ ■ ■ , hm)} when the 
variability from estimating the is neglected: 



N 



i + ..p{(x«-x<^))'(,,-^,))) 



(6) 



where uk = 0. 

Adapting Chen et al. (2009)'s Equation 2 from the dichoto- 
mous to the polytomous case gives the following expression for 
the weights instead: 



C„ 



N 



e.p{(xf)-x'-')X^,-^,)) 
h«p{(xW-J^'^))'(^y,-^,^.)j 



(7) 



We estimate the coefficients a using generalized estimating equa- 
tions (GEEs) with an independence working correlation matrix. 
With this approach the null hypothesis that the component 
Ph = 0 can be tested with the statistic 



2W = se-VVvLSC-)] 

which follows approximately a standard normal distribution 
under the null, when the weights are defined in such a way that 
the expectation of S^'"' is 0. The weight definition will only have an 
impact on power. The joint null hypothesis (S;,j = ■ ■ ■ = p^^^^ = 0 
for any subset of indices hi, ■ ■ ■ , h„, can be tested with the statistic 



T = (Sh, 



, ShJiv[Sh, 



.Sh,„]) (Sh, 



Sh„ 



(8) 
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which follows approximately a x ^ distribution with m degrees of 
freedom under the null. 

The variance of S'*"' depends on whether the null hypothesis 
refers only to absence of association, or to absence of genetic link- 
age and association. In the first case, the nuU distribution of S'*"' 
allows genetic linkage at the locus, and the identical-by-descent 
(IBD) sharing proportions in the variance estimate must be the 
actual IBD sharing proportions at the locus tt;,,-, (Chen et al., 
2009). For the second case, or when IBD is unknown, jihij can 
be substituted by twice the kinship coefficients 0y, which is con- 
stant at all loci. The general expression for the variance ofsW 
and covariance between S'''^ and S^s) 

is given in Supplementary 
Material. When S<''> takes the form 4, X^f is a main effect term, 
say Xi , and the actual IBD sharing proportions TC},ij are used then 



Vflr[S' 



Var 



(9) 



= i:QQ,Cbv[xlf-xJ\X 



ih) yih) _ y(h)^ 



= J2 QjCk,{Cov[Xu,Xik] + Cov[Xij,Xu] 

- Cov[Xi„Xi,]-Cov[Xy,Xu]) 

= ^ ^ CijCki (jTi,! -I- Tiiji — nui - jtijk) al 
i,k€Ej,laE^ 

The within-family variance of Xi, a^, is estimated as described 
in Supplementary Material to obtain the estimate v[S''''] of 
Var[S'''^]. With equal weights for all pairs, the computation 
involving the IBD sharing probabilities can be simplified as 
explained in Supplementary Material. 

When X*''' is instead a product term, say X1X2, then 



Var[S 



Var 



tsE j€E^ 



ih) 



■xC)) 

a; ' 



„2 ^ ^ ^ X/ 

>e£ ;€F k€E leE'^ 

/ Cov[Xi,X2„ XikX2k] + Cov[XijX2j, XuX2i] 

l,-C0V[Xi,X2„Xi,X2,] - C0V[Xi^X2;,Xi^X2it] 

~ ^ ^ ^ X/ ^ 

>e£ ;€F k€E IsE^ 

i^lik^lik + ^Ijl^ljl - ^lil^lil - ^\jk^2jk) (^12 

where the within-family variance of the product term X1X2, o'j^2) 
is estimated as described in Supplementary Material. 

2.1. APPLICATION TO THE JOINT MODELING OF TWO DICHOTOMOUS 
TRAITS USING TWO-LOCUS MODELS 

The joint analysis of two dichotomous traits represents an impor- 
tant special case of a polytomous phenotype with four categories. 



We illustrate such a phenotype by referring to a dichotomous dis- 
ease trait Y2 and a dichotomous endophenotype Yi , as defined in 
the introduction. 

We consider here polytomous models for two markers at 
unlinked loci which may interact to cause the disease and 
endophenotype impairment. We assume that association of locus 
1 to the endophenotype impairment i^i = 1 and possibly to the 
disease Y2 = 1 has already been established, and that we want 
to detect locus 2, which is undetectable in single-locus analyses, 
by conditioning on locus 1 with which it interacts. This leads to 
null hypotheses on a subset of coefficients tested with a statistic as 
defined in Equation 8. 

A first option is to use the full model with distinct coeffi- 
cients for each disease/endophenotype combination contrasted 
to the reference category of absence of both the disease and 
endophenotype impairment. This model is: 



log 



log 



log 



p[yi = i,y2 = o|Xi.X2] 

P[7i = 0,72 = 01X1, X2], 
= Pio + PiiXi + ^12X2 + /il3-'fl^2 
P[7i =0,72 = l|Xi,X2]^ 



(10) 



?[7i =0, 72 = 0|Xi,X2] 
= P20 + P21X1 + P22X-2 + P23X1X2 
P[7i = l,72 = l|Xi,X2]^ 



P[7i =0, 72 = 0|Xi,X2], 

= P30 + P31X1 + ^32X2 + /i33-'fl^2 



The null hypothesis of the conditional test of locus 2 given locus 
1 under the full model is formulated as: 



0 



(11) 



When the null is rejected, insights on the phenotype category 
driving the signal can be obtained by examining the Z statistics 
for each coefficient and the p-values associated to the tests of the 
subsets of coefficients (fiu, P\3),{h2, P23) and (^32, fts)- 

One can also postulate a model for a particular form of 
interaction between the two loci. We consider a model which 
we call the endophenotype-to-disease model where an allele at 
locus 1 increases susceptibility to the endophenotype impair- 
ment 7i = 1 and possibly to the disease 72 = 1, and an allele at 
locus 2 increases susceptibility to the disease in carriers of gene 
1 susceptibility genotypes (at higher risk of the endophenotype 
impairment). For that model we express allele counts as propor- 
tion of a given allele in a genotype, taking values 0, j and 1. The 
model is then written as: 



log 



log 



P[7i = l,72 = 0|Xi,X2] 
P[7i =0, 72 = 0|Xi,X2] 
= ^10 + ^1 1X1 + /3,Xi(l-X2) 

P[7i =0,72 = l|Xi,X2]^ 
P[7i =0, 72 = 0|Xi,X2] 

= P20 + P21X1 



(12) 
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log 



P[Yi = l,Y2 = l\Xi,X2] 
P[Yi=0,Y2 = 0\Xi,X2] 



We keep the same notation for the coefficients as in the full model, 
except for the coefficient which represents the effect on the risk 
of the endophenotype impairment in non-carriers of the locus 2 
tested allele. When the endophenotype-to-disease model holds, 
the coefficients (S33 and fie are of the same sign. The marginal 
association of X2 to the endophenotype impairment under that 
model will typically be small. Its direction and magnitude depend 
on the values of P33 and fi^ and the distribution of Xi . 

The nuU hypothesis of the conditional test of locus 2 given 
locus 1 under the above model is formulated as: 

fie = ^3 = 0 (13) 

The alternative hypothesis can be restricted to 

fie >0,fiii>OUfi,< 0, ;S33 < 0 

or a general alternative can be considered, but the alternative 
space then contains models outside of the conceptual model 
formulated above. 

Alternatively, detection of locus 2 can be attempted by test- 
ing a single interaction parameter between Xi and X2, as in 
the context of a genetic analysis conditional on an environmen- 
tal exposure (Kraft et al., 2007). Here the interaction parameter 
for the logistic function contrasting the disease and endophe- 
notype impairment category to the reference category fi^^ is 
the most promising to test to detect effects on the disease and 
endophenotype impairment jointly. 

2.2. SOFTWARE IMPLEMENTATION 

We have implemented the extension of the GDT to polyto- 
mous phenotypes and two loci in the R package f at2Lpoly, 
standing for Family-based Association Test for 2 Loci and 
Polytomous phenotypes available on the CRAN archive at 
CRAN.R-pro;ect.org/package=fat2Lpoly. A function is pro- 
vided to read phenotype and genotype data, variable names 
and IBD sharing proportions (if applicable) from input files 
in the Merlin/QTDT format (www.sph.umich.edu/csg/abecasis/ 
Merlin/tour/input_files.html) and convert them into R objects. 
Alternatively, R objects made by the user in the same for- 
mat can be provided as input. Functions are provided to 
setup design matrices for the full two-locus polytomous 
model, the one-locus polytomous model and the disease-to- 
endophenotype model. User-defined functions setting-up cus- 
tomized design matrices can be provided instead of these 
pre-defined functions. 

2.3. EVALUATION BY SIMULATION OF THE PROPOSED HYPOTHESIS 
TESTS UNDER TWO-LOCUS MODELS 

The family structure used in the simulations is a 3-generation 
16-member family depicted in Figure 1. The disease and 
endophenotype status of all family members was assumed to be 



001 



002 



6p 

003 007 



hrO 



004 005 



006 



008 



005 010 Oil 012 013 014 015 016 

I I Unaffected without endophenotype (Yj = 0,^2 = 0) 

P Unaffected with endophenotype (Y^ -1,^2- 0) 

H Affected without endophenotype (Yj = 0, Yj = 1) 

I Affected with endophenotype (Y^ = 1, Yj = 1) 

FIGURE 1 I Structure of simulated families with an example of 
phenotype realization. 



observed. We generated genotype data for genetic variants with 
two alleles such as single nucleotide polymorphisms (SNPs) at 
two independent loci. The genotypes of pedigree founders were 
sampled under Hardy- Weinberg equilibrium using risk allele 
frequencies (RAFs) of 0.1 at locus 1 and 0.3 at locus 2. The 
transmission of alleles to their descendants was then simulated 
following the rules of Mendelian inheritance. Two dichotomous 
phenotypes Yi and Y2 were generated in a two-step approach: we 
first simulated from the distribution of 7,i for each subject i by 
summing over 7,2 in a polytomous model, then from the distri- 
bution of the vector ^2 1 i^i . In the model to simulate \Yi, Yi is 
treated as a vector of fixed effect, with the effect of the endophe- 
notype of subject h, Y^i, modulated by the kinship coefficient (/),7, 
between i and h. An additive polygenic effect on the logit of Y2 
was also included. The model can be written: 

fP[Ya = l\YuX,U]\ V ^^^7 n^^ 

log = y (Xi, y,i) + Ui (14) 

^\P[Y,2 = 0\YuX,U]J 



-I- « ^ (Yhi - v)(t>ii 
[7~N(0, o-^O) 



(15) 



where y'iXi, Yn) in an abbreviated expression of the model for 
the disease phenotype given the genotype at major loci and 
endophenotype status of subject i derived from a polytomous 
model and $ is the kinship matrix between the famQy members. 
The parameter cr^ controls the degree of polygenic dependence 
between the disease status Y2 of the family members and the 
parameter a the degree of genetic dependance of Y2 on Yi not 
captured by the genotype at the loci in the model. The parameter 
V, between 0 and 1, determines the relative importance of the risk 
increase 1 — v due to observing an endophenotype impairment 
and the risk decrease — v due to observing the normal level of the 
endophenotype in a relative. We note this simulation scheme is 
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meant to reproduce the association between disease phenotype 
and endophenotype status of relatives, not to represent a causal 
mechanism. Among the simulated families, we kept those with 
at least a cousin pair with Y2 = 1, i.e., affected by the disease to 
mimic the ascertainment process of families in a genetic study. 

We simulated two scenarios of population origin of the sam- 
ple: (1) homogeneity: the sample came from a single population 
where the phenotypes were generated under the polytomous 
model presented in Table 1. Under this models and with the 
above RAFs, the disease had a population prevalence of 0.0076 
and the endophenotype impairment a prevalence of 0.128; (2) 
heterogeneity: the sample was a mixture of families from two 
populations, both represented in equal proportions. In popula- 
tion 1, all intercept coefficients in Table 1 were reduced by 0.5, 
while in population 2 they were increased by 0.5. This resulted in 
disease prevalences of 0.005 in population 1 and 0.012 in popu- 
lation 2, and endophenotype impairment prevalences of 0.082 in 
population 1 and 0.194 in population 2. 

To verify the Type I error of tests of association to locus 2 under 
the nuU hypothesis of no association to locus 2, but in presence 
of genetic linkage at that locus, we generated an additional bial- 
lelic variant at locus 2 independent from the causal variant at that 
locus, i.e., in linkage equilibrium with it. In the homogeneous 
population, the minor allele frequency of that marker was equal 
to the RAF of the causal variant, but in the mixture of two popu- 
lations the minor allele frequency was 0.1 in population 1 and 0.5 
in population 2, creating population structure at that locus. For 
the power evaluation, we tested association to the actual causal 
variant at locus 2. 

The tests evaluated include the tests of the nuU hypotheses 1 1 
which we denote "cpoly," 13 which we denote Pa)" and 
(633 = 0. We also evaluated a single locus polytomous model 



Table 1 | Regression coefficients of the example polytomous model. 



Coef 


Value 


Coef 


Value 


Coef 


Value 


Coef 


Value 


^10 


-2 


/ill 


log (2) 


^12 


0 


/5l3 


-log (2) 


P20 


-5.5 


fti 


0 


P22 


0 


^23 


0 


fto 


-5.5 


fti 


0 


P22 


0 


^33 


log (16) 



(model 11 with X2 only). The coefficients in that model are 
labeled /i(li), and we tested the null hypotheses y8(ll) = 0 as 
well as ySsClL) = 0. For the evaluation of the Type I error, Wald 
tests of the coefficients of the one locus model based on GEEs 
were also performed. However, these tests were not used for the 
power comparison, since they had inflated Type I error under 
our heterogeneity scenario where population stratification was 
present. 

In presence of population stratification, previously available 
valid tests are restricted to a dichotomous outcome and a single 
marker. Analysis options are then limited to testing association of 
a single marker to the dichotomous endophenotype 7i and dis- 
ease status Y2, either in the full sample or, in the case of 72> in a 
stratum defined by Yi . This is akin to the strategy for detecting 
modifier genes conferring susceptibility to a specific phenotype 
(i.e., the disease) consisting in testing association to the specific 
phenotype among subjects with a broader phenotype (i.e., the 
endophenotype impairment) (Bureau et al., 2012). We therefore 
compared the power of various tests derived under our exten- 
sion of the GDT against the single marker GDT for dichotomous 
outcomes applied to the locus 2 causal variant with three phe- 
notype definitions: (1) the disease status Y2 (standard analysis 
noted simply GDT), (2) the disease status Y2 in the subset of 
subjects with Y\ = \ (endophenotype impairment), setting the 
phenotype of other subjects to unknown (GDTc), and (3) the 
endophenotype status Y\ (GDTe). We also compared our tests to 
score tests of coefficients of the usual two-locus logistic model for 
a dichotomous trait: 

, {P[Y =\\Xi,X2\\ , , 

l°g( p[y^O|x| X2] ) " ^ '^'^^ ^ ^ ^^^^ 

The 2 d.f test of the null hypothesis 772 = '?3 = 0 is denoted "cdis- 
ease" when the phenotype tested is Y2 and "cendo" when the 
phenotype tested is Y\ . 

3. RESULTS 

3.1. EVALUATION OF THE TYPE I ERROR 

The Type I error was evaluated on 1000 replicate samples of 100 
families. The results of the simulation under the null hypothesis 
in Table 2 show that the nominal Type I error rate was respected 



Table 2 | Estimations of Type I error on 1000 replicate samples of 100 families. 



GEE 








Conditional lil<elihood 




Single locus 




Single locus 






Given other locus" 






g(U) 






^33 


(^a,^33) 


cpoly 




HOMOGENEOUS POPULATION "^^^^W "^^^^^hmh^^ 


a = 0.01 : 0.009 
a = 0.05 : 0.053 


0.015 
0.060 


0.006 
0.051 


0.012 
0.045 


0.001 
0.019 


0.001 
0.003 


0.007 
0.029 


MIXTURE OF TWO POPULATIONS ^^^V^ ^^^^^^^^^^^m ^^^^^^^^^^^^^^H 


a = 0.01 : 0.102 
a = 0.05 : 0.237 


0.762 
0.906 


0.012 
0.048 


0.010 

0.053 


0.002 
0.025 


0.001 
0.003 


0.007 

0.033 



^ subject pairs were weighted using expression 6. 
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P33 (Pe.Pss) cpoly cdisease cendo 



FIGURE 2 I Power of various within-family score tests to detect locus 2. 

See text for definitions of the acronyms of the tests. For tests conditional 
on another locus, subject pairs were weighted using expression 6. 



under both scenarios for all test statistics from our polytomous 
extension of the GDT. The Type I error rates of the tests condi- 
tional on locus 1 were similar for weight definitions 6 and 7, so 
only results for the former are shown. They were both below the 
nominal level, making these tests conservative. By contrast, the 
Type I error of the Wald tests based on GEE estimates were at 
nominal level only under the homogeneous sample scenario, and 
were severely inflated under the heterogeneous sample scenario. 

3.2. EVALUATION OF THE POWER 

Under the simulated scenario the endophenotype-to-disease 
model holds. While the test of the null hypothesis 13 has some 
power, testing /^33 = 0 (the interaction parameter for the com- 
bination of disease and endophenotype impairment) achieves 
the highest power among the tests considered (Figure 2). Using 
weight definition 7 instead of 6 led to nearly identical power 
(results not shown). Under this scenario, testing association for 
the same phenotypic category of the allele count at locus 2 
^63(11) = 0 or the entire vector jS(lL) = 0 does not provide a 
measurable power improvement over the GDT applied to the dis- 
ease status in the subset of subjects with endophenotype impair- 
ment. Further comparisons of testing strategies under a variety of 
scenarios will be reported elsewhere. 

3.3. APPLICATION TO MAJOR PSYCHOSIS AND VISUAL EPISODIC 
MEMORY 

Schizophrenia (SZ) and bipolar disorder (BP) are two forms 
of the spectrum of major psychosis (MP), which also includes 
schizo-affective disorder. SZ and BP co-aggregate in fam- 
ilies (Van Snellenberg and de Candia, 2009), and share 
genetic liability (Cross-Disorder Group of the Psychiatric 
Genomics Consortium, 2013). Various cognitive domains are 
widely recognized as endophenotypes of MP (Bora et al., 2009; 
Ivleva et al, 2010). In the Eastern Quebec kindred study, visual 
episodic memory (VisEM) was found to be impaired in both 
SZ and BP patients and non-affected adult relatives of these 
patients (Maziade et al., 2011). In that same family sample, we 
recently replicated an association between the T allele of SNP 
rs 1156026 and SZ that we had previously detected in another 
sample (Bureau et al., 2013). AH the elements required for the 
application of our extension of the GDT to markers genotyped in 
the family sample are present: a diagnosis within the spectrum of 
MP as the disease phenotype, a VisEM mesurement dichotomized 
as presence/absence of deficit as the endophenotype and the SNP 
rsl 156026 as the established risk locus. Given the small number 
of subjects with cognitive measurements, this analysis is not suf- 
ficiently powered to draw conclusions and must be considered 
illustrative. The small sample size also limited us to an analysis 
of MP globally, without separating SZ and BP. 

VisEM was measured by the performance on the delayed recall 
of the Rey figure task (Meyers and Meyers, 1995) defining the 
affected status as being the 4th percentile of the distribution 
of age and gender matched controls. We retained the 14 infor- 
mative families defined as containing at least one MP affected 
subject with a visual memory measurement and subjects in at 
least one other phenotypic category. Table 3 presents the joint 
distribution of MP and VisEM in the 133 genotyped subjects 



from these families along with the frequency of the rsl 156026 T 
allele. Although the frequency of the T allele is greatly increased 
in subjects with MP and the VisEM impairment compared to 
normal subjects (and this increase is statistically significant in a 
population-level comparison) the within-family score test of the 
corresponding coefficient has a high ^i-value, suggesting that the 
difference in T allele frequency is mostly between families and not 
so much within families. 

We tested association to 80 SNPs in genomic regions where 
genetic linkage to SZ, BP, or MP was previously detected in that 
family sample on the p arm of chromosomes 6, 8, and 16 and 
the q arm of chromosomes 12 and 18 (Maziade et al., 2005). We 
applied the same tests as in the simulation study. SNPs where a 
p-value < 0.05 was obtained in at least one analysis are shown in 
Table 4. 

The results for rs7500550 illustrate that tests of the joint MP- 
VisEM phenotype conditional on the rsl 156026 T allele count can 
detect associations to SNPs where the test of the MP or VisEM 
phenotype alone did not. In this case, the rare allele was nega- 
tively associated to MP with VisEM impairment with Z statistics 
of -2.66 for the X2 and -2.34 for the terms (p = 0.0019for 
the test of the coefficients of both terms) while it was positively 
associated to a lesser extent to MP without VisEM impairment 
with Z statistics of 2.54 for the X2 and 2.07 for the X1X2 terms 
{p = 0.005 for the test of the coefficients of both terms). The 
signal was thus driven by opposite associations to these two phe- 
notypic categories. The signal at rsl087266 was detected by single 
locus tests with lower p-values than by tests conditioning on 
rsl 156026. In that case, testing association with VisEM status was 
the key to detect the signal. Nonetheless, the conditional test of the 
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Table 3 | Joint distribution of major psychosis and visual episodic memory deficits along with the frequency of the rs11 56026 T allele. 







VIsEM <= 


4th perc 






VisEM 


> 4th perc 




Total 






"1 


Freq T 


Pgee^ 




"0 


Freq T 


Pgee" 


Pu" 


n 


Freq T 


MP Yes 


21 (41%) 


0.52 


0.0011 


0.34 


30 


0.40 


0.040 


0.310 


51 (38%) 


0.45 


No 


13 (16%) 


0.31 


0.97 


0.72 


69 


0.30 






82 (62%) 


0.30 


Total 


34 (26%) 


0.44 






99 


0.33 






133 


0.36 



"p-values of Wald tests of the coefficients of the one locus poiytomous modei estimated using generalized estimating equations (GEE), 
^p-values of within-family score tests of the coefficients of the one locus polytomous model. 



Table 4 | Results for SNPs where a p-value < 0.05 was obtained In at least one analysis^ 



SNP 


Chr 


Pos (Mb) 






MAF (n) 














Vi = 0 , = 0 


y, = 0, = 1 


Vi = 1, ^2 =0 


Vi = 1, V2 = 1 


rs1087266 


6 


24.4 




0.39 (42) 


0.26 (25) 


0.60 (5) 


0.55 (19) 


rs7500550 


16 


19.1 




0.11 (41) 


0.16 (25) 


0.17 (6) 


0.03 (18) 


TESTS p-VALUES 
















SNP 


GDT 


GDTc 


GDTe 


g(U) 


^33 


We, ^33) 


cpoly 


rs1087266 


0.48 


0.25 


0.005 


0.005 


0.032 


0.085 


0.006 


rs7500550 


0.52 


0.17 


0.57 


0.040 


0.019 


0.064 


0.015 



"For tests conditioning on rsn56026 genotypes, subject pairs were weighted using expression 6. 



polytomous phenotype provides a p-value similar to the standard 
GDT. Given the limited power of the analysis and the number 
of SNPs tested, these results cannot be considered statistically 
significant once multiple testing is taken into account. 

4. DISCUSSION 

We have extended the GDT, a score test of genetic association 
applicable with extended families, to enable testing association 
with a polytomous phenotype. Another extension is the use of 
a model of association with two genetic loci, allowing to test asso- 
ciation at a locus conditional on the genotype of a marker at a 
known risk locus, to exploit interaction between the two. A soft- 
ware implementation in the form of a R package has been made 
freely available. The within-famUy analysis framework that we 
adopted has the advantage of protecting against Type I error infla- 
tion due to population stratification. Polytomous phenotypes can 
be more informative than dichotomous ones to detect genetic 
associations, as illustrated in our simulation study. 

The proposed score tests also suffer from limitations. First, 
score tests provide no estimates of the regression parameters 
being tested. Conditional maximum likelihood estimation would 
be applicable only with exchangeable relatives, which is not 
required for the GDT as explained in Supplementary Material. We 
are exploring the robustness and power of conditional maximum 
likelihood estimation in sibships from extended families. 

Second, within-family analysis tends to be less powerful than 
population-level analysis which also exploits between family 
information. Furthermore, the Type I error of score tests for 
one locus conditionning on another tends to be conservative 
even with the weight definition 6 neglecting variability from 
estimating the Our simulation studies illustrate that power 



remains limited despite large sample sizes (1600 subjects in 100 
families) and large effect sizes (interaction odds ratios of 16). 
Extracting the most power from the data is particularly impor- 
tant when phenotypic measures are expensive to obtain, such as 
the cognitive measurements in our example. Population analyses 
are then attractive, with an adjustment for population structure 
using genomewide SNP genotypes (Price et al., 2006). Methods 
for population analysis of polytomous phenotypes are not well 
developed, and will be the object of future work. 
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